Efficient Representation for Online Suffix Tree Construction

Authors

  • N. Jesper Larsson
  • Kasper Fuglsang
  • Kenneth Karlsson
Abstract

Suffix tree construction algorithms based on suffix links are popular because they are simple to implement, can operate online in linear time, and provide suffix links that are often convenient for pattern matching. We present an approach using edge-oriented suffix links, which reduces the number of branch lookup operations (known to be a bottleneck in construction time), together with some additional techniques that reduce construction cost. We discuss various effects of our approach and compare it to previous techniques. An experimental evaluation shows that we are able to reduce construction time to around half that of the original algorithm, and to about two thirds that of previously known branch-reduced construction.
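To make the suffix-link mechanism and the branch lookups mentioned above concrete, here is a minimal Python sketch of the classic online construction (Ukkonen's algorithm) using ordinary node-oriented suffix links. It is not the authors' edge-oriented representation, which the abstract does not describe in enough detail to reproduce; it only shows where branch lookups occur in the baseline approach. The names `Node`, `SuffixTree`, `extend`, `contains`, `build` and the `branch_lookups` counter are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Optional


class Node:
    __slots__ = ("start", "end", "children", "link")

    def __init__(self, start: int, end: Optional[int]) -> None:
        self.start = start  # edge into this node spells text[start:end]
        self.end = end      # None marks a leaf: its edge grows with the text
        self.children: Dict[str, "Node"] = {}
        self.link: Optional["Node"] = None  # node-oriented suffix link


class SuffixTree:  # Ukkonen-style online construction, node-oriented links
    def __init__(self) -> None:
        self.text: List[str] = []
        self.root = Node(0, 0)
        self.active_node = self.root
        self.active_edge = 0     # text index of the active edge's first char
        self.active_length = 0
        self.remainder = 0       # suffixes not yet inserted explicitly
        self.branch_lookups = 0  # hypothetical instrumentation counter

    def _edge_length(self, node: Node) -> int:
        end = len(self.text) if node.end is None else node.end
        return end - node.start

    def _branch(self, node: Node, ch: str) -> Optional[Node]:
        self.branch_lookups += 1  # count every child lookup during construction
        return node.children.get(ch)

    def extend(self, ch: str) -> None:
        # Append one character: one online step of the construction.
        self.text.append(ch)
        pos = len(self.text) - 1
        self.remainder += 1
        pending: Optional[Node] = None  # new internal node awaiting its link
        while self.remainder > 0:
            if self.active_length == 0:
                self.active_edge = pos
            edge_ch = self.text[self.active_edge]
            nxt = self._branch(self.active_node, edge_ch)
            if nxt is None:  # no matching edge: attach a new leaf
                self.active_node.children[edge_ch] = Node(pos, None)
                if pending is not None:
                    pending.link = self.active_node
                    pending = None
            else:
                length = self._edge_length(nxt)
                if self.active_length >= length:  # skip/count: hop the edge
                    self.active_edge += length
                    self.active_length -= length
                    self.active_node = nxt
                    continue
                if self.text[nxt.start + self.active_length] == ch:
                    # Suffix already present implicitly: this step is done.
                    if pending is not None and self.active_node is not self.root:
                        pending.link = self.active_node
                    self.active_length += 1
                    break
                # Split the edge and hang a new leaf off the split point.
                split = Node(nxt.start, nxt.start + self.active_length)
                self.active_node.children[edge_ch] = split
                split.children[ch] = Node(pos, None)
                nxt.start += self.active_length
                split.children[self.text[nxt.start]] = nxt
                if pending is not None:
                    pending.link = split
                pending = split
            self.remainder -= 1
            if self.active_node is self.root and self.active_length > 0:
                self.active_length -= 1
                self.active_edge = pos - self.remainder + 1
            elif self.active_node is not self.root:
                # Follow the suffix link (to the root if none was set).
                self.active_node = self.active_node.link or self.root

    def contains(self, pattern: str) -> bool:
        node, i = self.root, 0  # substring test: walk down from the root
        while i < len(pattern):
            nxt = node.children.get(pattern[i])
            if nxt is None:
                return False
            end = len(self.text) if nxt.end is None else nxt.end
            j = nxt.start
            while j < end and i < len(pattern):
                if self.text[j] != pattern[i]:
                    return False
                i, j = i + 1, j + 1
            node = nxt
        return True


def build(s: str) -> SuffixTree:
    tree = SuffixTree()
    for ch in s:
        tree.extend(ch)
    return tree
```

Each call to `extend` performs one online step, and every `_branch` call is one branch lookup, so `build(s).branch_lookups` gives a rough count of the operation the abstract identifies as the bottleneck. Ending the input with a unique terminator such as `$` ensures every suffix ends at a leaf.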

Similar resources

Online Suffix Tree Construction for Streaming Sequences

In this study, we present an online suffix tree construction approach where multiple sequences are indexed by a single suffix tree. Due to poor memory locality and high space consumption, online suffix tree construction on disk is a challenging process. Moreover, construction performance suffers when the alphabet size is large. In order to overcome these difficulties, first, we present a s...

Suffix Vector: Space- and Time-Efficient Alternative to Suffix Trees

Suffix trees are versatile data structures that are used for solving many string-matching problems. One of the main arguments against widespread usage of the structure is its space requirement. This paper describes a new structure called suffix vector, which is not only better in terms of storage space but also simpler than the most efficient suffix tree representation known to date. Alternativ...

On-line construction of compact suffix vectors and maximal repeats

A suffix vector of a string is an index data structure equivalent to a suffix tree. It was first introduced by Monostori et al. in 2001 [12,13,14]. They proposed a linear construction algorithm of an extended suffix vector, then another linear algorithm to transform an extended suffix vector into a more space economical compact suffix vector. We propose an on-line linear algorithm for directly ...

Search-Optimized Persistent Suffix Tree Storage for Biological Applications

The suffix tree is a well known and popular indexing structure for various sequence processing problems arising in biological data management. However, unlike traditional indexing structures, suffix trees are orders of magnitude larger than the underlying data. Moreover, their construction and search algorithms are extremely inefficient when implemented directly on disk. Recently, we have shown...

Efficient Implementation of Lazy Suffix Trees

We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of ...

Publication date: 2014